
feat(megatron): migrate profiler args in core workflow #573

Merged
wenxie-amd merged 2 commits into main from feat/megatron/torch_profiler
Mar 3, 2026


@HuangWei-95
Collaborator

feat(megatron): migrate profiler args in core workflow

HuangWei-95 added 2 commits March 3, 2026 09:47
### Changes:
- Introduced a new file `torch_profiler_patchers.py` that patches `torch.profiler.profile` to integrate Primus-specific options during training.
- Implemented logic to check if the profiler is called from `megatron.training.train` and create a profiler with appropriate settings.
- Added error handling and logging for better debugging.
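
The patching approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in `torch_profiler_patchers.py`: the names `patch_profile` and `call_from` are hypothetical, the caller check by module `__name__` is an assumption about how the call site is detected, and a plain namespace object stands in for `torch.profiler` so the sketch runs without PyTorch installed.

```python
import functools
import logging
import sys
import types

log = logging.getLogger(__name__)


def patch_profile(profiler_module, target_module_name, overrides):
    """Replace profiler_module.profile with a wrapper (hypothetical helper).

    Calls made from the module named target_module_name get `overrides`
    merged into their keyword arguments; all other calls pass through.
    Returns the original callable so the patch can be undone.
    """
    original = profiler_module.profile

    @functools.wraps(original)
    def patched(*args, **kwargs):
        try:
            # Frame 1 is the code that called patched(); read its module name.
            caller = sys._getframe(1).f_globals.get("__name__", "")
        except ValueError:
            # No caller frame available: log and fall back to the original.
            log.warning("could not inspect caller; using unpatched profiler")
            caller = ""
        if caller == target_module_name:
            log.debug("applying profiler overrides for %s", caller)
            kwargs = {**kwargs, **overrides}
        return original(*args, **kwargs)

    profiler_module.profile = patched
    return original


def call_from(module_name, profiler_module, **kwargs):
    """Demo helper: call profiler_module.profile as if from module_name."""
    scope = {"__name__": module_name, "p": profiler_module, "kw": kwargs}
    exec("result = p.profile(**kw)", scope)
    return scope["result"]


# Stand-in for torch.profiler: `profile` simply echoes its keyword arguments.
fake_profiler = types.SimpleNamespace(profile=lambda **kw: kw)
patch_profile(fake_profiler, "trainer", {"record_shapes": True})
```

In this sketch, only calls originating from the module named `"trainer"` receive the extra options. The real `torch.profiler.profile` is a class, so a function wrapper like this one would break `isinstance` checks against it; a production patch would need to account for that.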

### Reason for changes:
This patch lets Megatron's profiler use Primus options, which improves performance monitoring during training.
- Improved error handling and logging in `torch_profiler_patchers.py` so that profiling failures produce clearer debugging information.
- Refined the patching logic to stay compatible with various Megatron training scenarios.

Together, these changes make profiling issues easier to debug and monitor when Primus options are used in Megatron training.
@HuangWei-95 force-pushed the feat/megatron/torch_profiler branch from 93675d7 to 2a37626 on March 3, 2026 01:59
@wenxie-amd wenxie-amd merged commit 27571ee into main Mar 3, 2026
10 of 13 checks passed
